Robust in-car spelling recognition - a tandem BLSTM-HMM approach
نویسندگان
چکیده
As an intuitive hands-free input modality automatic spelling recognition is especially useful for in-car human-machine interfaces. However, for today’s speech recognition engines it is extremely challenging to cope with similar sounding spelling speech sequences in the presence of noises such as the driving noise inside a car. Thus, we propose a novel Tandem spelling recogniser, combining a Hidden Markov Model (HMM) with a discriminatively trained bidirectional Long Short-Term Memory (BLSTM) recurrent neural net. The BLSTM network captures long-range temporal dependencies to learn the properties of in-car noise, which makes the Tandem BLSTM-HMM robust with respect to speech signal disturbances at extremely low signal-to-noise ratios and mismatches between training and test noise conditions. Experiments considering various driving conditions reveal that our Tandem recogniser outperforms a conventional HMM by up to 33%.
منابع مشابه
Noise robust ASR in reverberated multisource environments applying convolutive NMF and Long Short-Term Memory
This article proposes and evaluates various methods to integrate the concept of bidirectional Long Short-Term Memory (BLSTM) temporal context modeling into a system for automatic speech recognition (ASR) in noisy and reverberated environments. Building on recent advances in Long Short-Term Memory architectures for ASR, we design a novel front-end for contextsensitive Tandem feature extraction a...
متن کاملHybrid HMM/BLSTM-RNN for Robust Speech Recognition
The question how to integrate information from different sources in speech decoding is still only partially solved (layered architecture versus integrated search). We investigate the optimal integration of information from Artificial Neural Nets in a speech decoding scheme based on a Dynamic Bayesian Network for noise robust ASR. A HMM implemented by the DBN cooperates with a novel Recurrent Ne...
متن کاملLong short-term memory networks for noise robust speech recognition
In this paper we introduce a novel hybrid model architecture for speech recognition and investigate its noise robustness on the Aurora 2 database. Our model is composed of a bidirectional Long Short-Term Memory (BLSTM) recurrent neural net exploiting long-range context information for phoneme prediction and a Dynamic Bayesian Network (DBN) for decoding. The DBN is able to learn pronunciation va...
متن کاملBLSTM-RNN Based 3D Gesture Classification
This paper presents a new robust method for inertial MEM (MicroElectroMechanical systems) 3D gesture recognition. The linear acceleration and the angular velocity, respectively provided by the accelerometer and the gyrometer, are sampled in time resulting in 6D values at each time step which are used as inputs for the gesture recognition system. We propose to build a system based on Bidirection...
متن کاملAlert correlation and prediction using data mining and HMM
Intrusion Detection Systems (IDSs) are security tools widely used in computer networks. While they seem to be promising technologies, they pose some serious drawbacks: When utilized in large and high traffic networks, IDSs generate high volumes of low-level alerts which are hardly manageable. Accordingly, there emerged a recent track of security research, focused on alert correlation, which ext...
متن کامل